Conducting an objective structured clinical examination under COVID-restricted conditions

Background The administration of performance assessments during the coronavirus disease of 2019 (COVID-19) pandemic posed many challenges, especially for examinations employed as part of certification and licensure. The National Assessment Collaboration (NAC) Examination, an Objective Structured Clinical Examination (OSCE), was modified during the pandemic. The purpose of this study was to gather evidence to support the reliability and validity of the modified NAC Examination. Methods The modified NAC Examination was delivered to 2,433 candidates in 2020 and 2021. Cronbach’s alpha, decision consistency, and accuracy values were calculated. Validity evidence includes comparisons of scores and sub-scores for demographic groups: gender (male vs. female), type of International Medical Graduate (IMG) (Canadians Studying Abroad (CSA) vs. non-CSA), postgraduate training (PGT) (no PGT vs. PGT), and language of examination (English vs. French). Criterion relationships were summarized using correlations within and between the NAC Examination and the Medical Council of Canada Qualifying Examination (MCCQE) Part I scores. Results Reliability estimates were consistent with other OSCEs similar in length and previous NAC Examination administrations. Both total score and sub-score differences for gender were statistically significant. Total score differences by type of IMG and PGT were not statistically significant, but sub-score differences were statistically significant. Administration language was not statistically significant for either the total scores or sub-scores. Correlations were all statistically significant with some relationships being small or moderate (0.20 to 0.40) or large (> 0.40). Conclusions The NAC Examination yields reliable total scores and pass/fail decisions. 
Expected differences in total scores and sub-scores for defined groups were consistent with previous literature, and internal relationships amongst NAC Examination sub-scores and their external relationships with the MCCQE Part I supported both discriminant and criterion-related validity arguments. Modifications to OSCEs to address health restrictions can be implemented without compromising the overall quality of the assessment. This study outlines some of the validity and reliability analyses for OSCEs that required modifications due to COVID.


Background
Objective Structured Clinical Examinations (OSCEs) date back over five decades [1]. An OSCE is a standardized performance assessment, where standardized participants (SPs) interact with candidates on a series of scripted clinical scenarios, called cases or stations [2]. These performance-based examinations eventually became a mainstay of clinical skills assessment for certifying and licensing physicians [3]. In 1992, the Medical Council of Canada (MCC) introduced the Medical Council of Canada Qualifying Examination (MCCQE) Part II, a pre-requisite for medical licensure in Canada [4]. In 2004, passing the United States Medical Licensing Examination (USMLE) Step 2 Clinical Skills (CS) became a licensure requirement for all United States MD graduates and International Medical Graduates (IMGs) [3]. A similar performance assessment requirement for licensure was also introduced for osteopathic medical students in 2004 (COMLEX-USA-PE) [3]. Outside of North America, various other high-stakes performance-based assessments were also introduced [5][6][7]. These examinations were administered to ensure that graduating medical students, or those eventually seeking unrestricted medical licenses, possessed the skills needed to interact with and manage patients.
Performance-based examinations in medicine can take many forms but are generally constructed to measure data gathering (physical examination, history taking) and communication (with a patient or other healthcare provider). Other skills, including clinical decision-making, written communication, professionalism, and ethical behaviour have also been measured [8][9][10]. Scoring of the encounters can be done using checklists or rating scales (e.g., by the SP or an examiner in the room, or via videotape review), or some combination [11]. Although there can be considerable variation in the structure and content of OSCEs used for certification and licensure, researchers have provided ample evidence that the scores and pass/fail decisions are reliable and valid [12][13][14].
The psychometric properties of OSCEs and other performance-based assessments are well-described in the literature [15,16]. Modeling clinical encounters based on typical reasons for visiting a physician and standardizing the exam administration helps ensure that the score interpretation is valid. If there are enough behavioral samples (i.e., SP interactions), and adequate rater training, reliable estimates of ability can be procured [17]. From an extrapolation perspective, several studies linked performance on OSCEs to future practice outcomes, including patient care and disciplinary actions [18]. Likewise, there are expected performance differences amongst defined candidate cohorts [19]. In support of the validity argument, candidates with more clinical experience and better language skills have higher average scores on some measured constructs [9]. For communication skills, including counselling and listening, women have outperformed men. As expected, scores from performance assessments measuring clinical skills were only weakly related to knowledge-based selected response examinations [20]. For decisions based on performance-based certification and licensure examinations, standards are typically established via defensible and properly implemented procedures [21]. When properly constructed and administered, OSCEs and other performance-based assessments used for credentialing and licensure of physicians can provide valid scores and associated pass/fail decisions [22].

Interruptions in OSCEs during COVID
While national clinical skills assessments had operated for over 30 years, the arrival of the coronavirus disease of 2019 (COVID-19) forced many testing organizations to rethink their administration protocols. The USMLE decided to cancel the Step 2 CS examination and is attempting to measure some of the relevant clinical skills competencies in the other examinations required for medical licensure in the United States [23]. The National Board of Osteopathic Medical Examiners (NBOME) indefinitely suspended the COMLEX-USA-Level 2 Performance Evaluation (PE) and convened a Special Commission on Osteopathic Medical Licensure to investigate how clinical skills could best be measured [24]. The Special Commission proposed that, at least temporarily, enhanced attestation of clinical skills by the medical school would suffice. The MCC postponed the MCCQE Part II and attempted to pivot to a virtual format [25]. Unfortunately, the short time frame required to reestablish testing, the logistics of administering a virtual clinical skills assessment, and the large number of candidates to be tested resulted in the MCC abandoning these efforts and cancelling the examination. Overall, the onset of COVID-19 had a drastic impact on performance testing. Given that previous research has indicated that deviations from normal procedures can be a threat to score interpretation and the validity arguments for an examination [21,22], it was necessary to validate new processes and procedures. The MCC was able to administer the National Assessment Collaboration (NAC) Examination during this time frame, with modifications to content and delivery, to a large number of candidates.

Background on the NAC examination
In 2011, the MCC introduced the NAC Examination, which is an OSCE [26] with the purpose of assessing the clinical skills of IMGs being selected to enter postgraduate training (PGT) programs across Canada. The NAC Examination is a requirement for IMGs to apply to a Canadian residency program and is used to assist Canadian medical school residency programs in their selection of candidates for PGT [26].

NAC examination Pre-COVID
The NAC Examination consists of 10 different clinical scenarios per test administration and covers a variety of medical scenarios across various systems (e.g., endocrine, reproductive health) and disciplines (e.g., medicine, surgery). Each station has a combination of key feature checklist items, oral questions, and rating scales appropriate to the clinical scenario. Station scores are converted to percentages based on the aggregate item scores, and the total score is the average across the 10 stations. Total scores are adjusted across examination dates based on the difficulty of the set of stations [27] and converted to a reported score between 300 and 500. The NAC Examination was administered in early March 2020 and not disrupted as health restrictions came into effect later that month.
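The raw scoring described above can be illustrated with a minimal sketch. It assumes each station's percentage is the sum of item scores divided by the sum of the maximum item scores; the actual item weighting, the difficulty adjustment [27], and the conversion to the reported scale are not specified here and are deliberately left out. The function names and data layout are hypothetical.

```python
def station_percentage(item_scores, max_scores):
    """Percentage score for one station (assumed: unweighted sum of items)."""
    return 100.0 * sum(item_scores) / sum(max_scores)


def total_percentage(stations):
    """Average of the station percentages.

    `stations` is a list of (item_scores, max_scores) pairs, one per station.
    The difficulty adjustment and scale conversion are not modelled.
    """
    percentages = [station_percentage(items, maxima) for items, maxima in stations]
    return sum(percentages) / len(percentages)
```

Averaging station percentages, rather than pooling all items, gives each station equal weight in the total regardless of how many items it contains.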

Adjustments to the NAC examination during COVID
In September 2020 the MCC made modifications to the NAC Examination to ensure adherence to public health guidelines. These modifications were implemented for the administrations of the NAC Examination from September 2020 to September 2022 [ 1 ]. To modify the NAC Examination, the MCC organized "work streams" which covered all areas of exam development and administration, such as registration, communications, personal protective equipment (PPE) and physical distancing, training, candidate orientations, delivery, content, and psychometrics. The major psychometric change was to reset the score scale and conduct a new standard-setting exercise for September 2020 to ensure that the pass/fail cut score was valid for this modified examination [28]. The new score scale was set to 1300 to 1500.

Delivery adjustments
Health Canada guidelines mandated no gatherings of large groups, so the MCC adjusted all in-person training and orientation sessions to online modules. The MCC's training and orientation stream created accessible, interactive online learning modules for all participants and, in addition, produced "cheat sheets" of quick reminders that could be delivered to the candidates and the physician examiners on exam day. Registration was done in waves so that not all participants arrived or left at the same time, and catering was delivered to individuals so that participants were not unmasking and eating in groups. Additionally, there was a reduction of shared touch points, including writing utensils, paper and personal belongings. All participants in the exam were required to use personal protective equipment (PPE), and sites were provided with masks, acrylic barriers, gloves and sanitizers. The exam spaces and materials were sanitized frequently, including doorknobs, chair arms and any laminated documentation the candidates were required to handle. Physical distancing was addressed by restricting the size of groups (administrators, SPs, candidates, physician examiners) and the use of visual reminders. The exam sessions also employed staggered start times to avoid large gatherings at breaks and at departure times.

Content adjustments
This "physically distanced" examination also required the adaptation of the exam content. These adjustments were guided by internal and external subject matter experts (SMEs). Through workshops with SMEs, the content team made iterative adjustments to the case content, scoring checklists and rating scales. They revised physical examination cases to include a "touchless" physical examination, where candidates verbalized their approach and their rationale, rather than demonstrating their skills on SPs. Physician examiners reported relevant findings to the candidate. As a necessary trade-off, the MCC assessed the candidates on their clinical reasoning skills related to physical examination, rather than their performance of the relevant maneuvers.

Objectives of this study
This study aims to provide evidence to support the reliability and validity of the modified in-person NAC Examination with a touchless physical examination, as administered during the COVID-19 pandemic. The types of validity and reliability evidence that should be collected to support modified versions of an assessment are outlined, whether it is for a major modification or an interruption to exam delivery and content. With extensive modifications, the MCC was comfortable that the NAC Examination could be administered in-person and that relevant clinical skills could be assessed. The changes to the delivery and content required a new cut score to be established, as direct comparisons between pre- and post-COVID results cannot be supported. This study outlines outcome measures that should be evaluated when making major modifications to an exam program.

Methods
The NAC Examination candidate study cohort, data sources (NAC Examination, MCCQE Part I), dependent variables (total scores and sub-scores), and independent variables (candidate demographics) are described in this section.

Data sources
We analyzed data from the modified NAC Examination administered from September 2020 through October 2021 [ 2 ]. Each candidate can challenge the exam up to 3 times, once per calendar year. If a candidate has a pass on the NAC Examination, they can challenge the exam again as their performance is used for residency selection purposes. The quantitative analyses described below were based on candidates who were attempting the NAC Examination for the first time (n = 2,433).
A new score scale and cut score are established when major delivery, content and scoring changes occur for an examination. Given the extensive content and delivery changes, a new score scale and cut score were warranted. A new score scale was established in September 2020 along with a standard-setting exercise to establish a new cut score. Total scores for this score scale were equated across test forms to ensure that the scores and the cut score were on the same scale, so that candidate results during this time frame can be directly compared. Total scores ranged from 1300 to 1500, and sub-scores for Assessment and Diagnosis, Management, and Communication Skills were used for analyses.
The MCCQE Part I scores and sub-scores were used as criterion validity measures. The MCCQE Part I assesses the critical medical knowledge and clinical decision-making ability of a candidate at the level of a medical student completing their medical degree in Canada [29]. The MCCQE Part I is one of the requirements to obtain the Licentiate of the Medical Council of Canada, a credential that is required for medical license for various provinces and territories in Canada. It is a one-day computer-based exam consisting of 210 multiple-choice questions and a clinical decision-making component with 38 cases with short-menu and short-answer questions. The blueprint for the MCCQE Part I consists of two elements with sub-scores for Dimensions of Care: (1) Health promotion and Illness prevention, (2) Acute, (3) Chronic, and (4) Psychosocial aspects; and Physician Activities: (1) Assessment and diagnosis, (2) Management, (3) Communication, and (4) Professional behaviours. The blueprint was implemented in 2018; a new score scale, 100 to 400, was established at that time [30]. The MCCQE Part I is a co-requisite for IMG residency application; however, many candidates take the MCCQE Part I prior to attempting the NAC Examination as the MCCQE Part I is offered outside of Canada. We used this examination as criterion-related evidence for the NAC Examination as all candidates need to take both examinations for a residency application.
The data for the NAC Examination sample were merged with exam results for the MCCQE Part I taken in 2018 or later, resulting in 2,134 candidates having both exam scores. This matched sample was used to gather criterion-related validity evidence. The demographic variables include gender, type of IMG, previous PGT, and language. Threats to validity can occur when group differences are identified that cannot be explained. Gender, type of IMG, previous PGT and language are demographic variables that could support the valid score interpretation of NAC Examination total scores [9,19]. As previous research indicates, gender, years of experience and language can lead to performance differences. We wanted to determine if this was also true for the NAC Examination. Gender was categorized as female or male based on information provided at registration. Type of IMG was categorized as Canadian Studying Abroad (CSA) and non-CSA; candidates were identified as CSAs if they indicated that they were Canadian citizens or permanent residents in Canada when entering medical school. All other candidates were categorized as non-CSA. The type of advanced education was categorized as PGT if the candidate had postgraduate training (all training was outside of Canada), and non-PGT if the candidate had no postgraduate training. Language was categorized as English or French based on the delivery language of the NAC Examination.

Statistical analyses and outcome measures
Analyses
Several analyses were conducted to gather evidence to support the psychometric properties of the NAC Examination scores. Reliability evidence included Cronbach's alpha, decision consistency, and decision accuracy ranges for the September 2020 to 2021 exams. Validity evidence included comparisons of differences for total scores and sub-scores for demographic groups: gender (male vs. female), type of IMG (CSA vs. non-CSA), PGT (no PGT vs. PGT), and the language for test administration (English vs. French). Criterion relationships were quantified through correlations of the total scores and sub-scores internal to the NAC Examination as well as with the MCCQE Part I.
Reliability analyses
Cronbach's alpha was used to estimate score reliability for each test form. Cronbach's alpha indicates the desired consistency (or reproducibility) of exam scores across replications of measurement [31].
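For readers unfamiliar with the coefficient, a minimal sketch of the standard Cronbach's alpha formula follows, treating each station as an "item". This illustrates the general formula only and is not the software used for the analyses; the function name and the row-per-candidate input layout are assumptions.

```python
from statistics import pvariance


def cronbach_alpha(scores):
    """Cronbach's alpha for a candidates-by-stations score matrix.

    `scores` is a list of rows, one per candidate, each row holding that
    candidate's k station scores (hypothetical layout).
    alpha = k / (k - 1) * (1 - sum(station variances) / variance(total scores))
    """
    k = len(scores[0])
    if k < 2:
        raise ValueError("alpha needs at least two stations")
    station_vars = [pvariance([row[j] for row in scores]) for j in range(k)]
    total_var = pvariance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(station_vars) / total_var)
```

Population variances are used for both the numerator and denominator; using sample variances throughout would give the same ratio.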
Estimates indicating the decision consistency (DC) and decision accuracy (DA) of pass/fail decisions were calculated using the Livingston and Lewis procedure [32]. DC is an estimate of the agreement between classifications on potential parallel test forms; DA is the estimate of agreement between the observed classifications of candidates and those based on their true score (i.e., observed score ± measurement error). Ideally, both values should be high (i.e., 0.80 and above), suggesting reliable pass/fail classifications.
Validity analyses
Several separate t-tests were completed based on total score differences on the NAC Examination for the following demographic variables: (1) gender, (2) type of IMG, (3) PGT, and (4) language. Since several t-tests were conducted, we adjusted the significance level from 0.05 to 0.01 to guard against type I error [33]. For significant t-tests, Cohen's d was also calculated [34]. Effect sizes of 0.20 are considered small, 0.50 medium, and 0.80 large.
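The two statistics named above can be sketched as follows. This is an illustration under stated assumptions, not the study's code: the t statistic shown is the unequal-variances (Welch) form with Welch-Satterthwaite degrees of freedom, and Cohen's d is computed with a pooled standard deviation, which is one common convention and may differ from the exact variant used. Function names are hypothetical, and no p-value is computed.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Unequal-variances t statistic and Welch-Satterthwaite df."""
    va = variance(group_a) / len(group_a)  # squared standard error, group A
    vb = variance(group_b) / len(group_b)  # squared standard error, group B
    t = (mean(group_a) - mean(group_b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (
        va ** 2 / (len(group_a) - 1) + vb ** 2 / (len(group_b) - 1)
    )
    return t, df


def cohens_d(group_a, group_b):
    """Standardized mean difference using the pooled SD (assumed convention)."""
    na, nb = len(group_a), len(group_b)
    pooled_var = (
        (na - 1) * variance(group_a) + (nb - 1) * variance(group_b)
    ) / (na + nb - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)
```

Because the Welch degrees of freedom depend on the group variances, they are generally not whole numbers, unlike the pooled-variance t-test.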
Multivariate analyses of variance (MANOVA) were conducted for the same demographic analyses using the three NAC Examination sub-scores as the dependent variables: (1) Assessment and Diagnosis, (2) Management, and (3) Communication Skills. Analysis of variance (ANOVA) step-down tests were conducted for significant MANOVA results. We used p < .01 for the MANOVAs to control for type I errors.
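When a MANOVA compares exactly two groups, Wilks' Lambda converts to an exact F statistic with degrees of freedom (p, N - p - 1), where p is the number of dependent variables and N the total sample size. The sketch below shows this standard conversion; it is offered as an aid to reading F values of this kind, not as the procedure used in the analyses, and the function names are hypothetical. With N = 2,433 candidates and p = 3 sub-scores, the formula gives degrees of freedom of (3, 2429).

```python
def wilks_to_f_two_groups(wilks_lambda, n_total, n_dvs):
    """Exact F for a two-group MANOVA: F = ((1 - L) / L) * (df2 / df1)."""
    if not 0 < wilks_lambda <= 1:
        raise ValueError("Wilks' Lambda must lie in (0, 1]")
    df1 = n_dvs
    df2 = n_total - n_dvs - 1
    f_value = ((1 - wilks_lambda) / wilks_lambda) * (df2 / df1)
    return f_value, df1, df2


def f_to_wilks_two_groups(f_value, n_total, n_dvs):
    """Inverse of the conversion above, for recovering Lambda from F."""
    df2 = n_total - n_dvs - 1
    return 1 / (1 + f_value * n_dvs / df2)
```

Smaller values of Lambda indicate that group membership accounts for more of the multivariate variance, so a larger F corresponds to a smaller Lambda.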
For criterion-related validity analyses, we calculated Pearson correlations between the NAC Examination total scores and sub-scores and their relationships with the MCCQE Part I total scores and sub-scores.
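A minimal sketch of the Pearson correlation coefficient is given below for reference. It assumes two equal-length lists of paired scores for the same candidates (for example, a matched NAC Examination score and MCCQE Part I score per candidate); the function name and input layout are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between paired score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two paired lists with at least two scores")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)
```

Only candidates with scores on both examinations can contribute to such a coefficient, which is why a matched sample is required.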

Results
The descriptive statistics for the NAC Examination sample of 2,433 are shown in Tables 1 and 2. Table 1 shows the number of candidates, mean, and standard deviation (SD) for the total scores and the three sub-scores. Table 2 shows the number of candidates, mean, and SD for the total scores and sub-scores for each of the four demographic groupings of gender, type of IMG, type of PGT, and language.

Reliability analyses
September 2020 Cronbach's alpha ranged from 0.63 to 0.71. October 2021 Cronbach's alpha ranged from 0.68 to 0.72. DC and DA estimates were also calculated by test form. September 2020 DC values ranged from 0.86

Validity analyses
The first set of analyses were t-tests for the total scores for the NAC Examination for 4 demographic groups defined above (see Table 1 for descriptive statistics).
For gender (based on unequal variances), the average score for women was significantly greater than that for men at the p < .01 level, t (1925.
The second set of analyses consisted of four separate MANOVAs (using Wilks' Lambda) where demographic groupings were the independent variables, and the 3 NAC Examination sub-scores (Assessment and Diagnosis, Management and Communication Skills) were the dependent variables (see Table 2 for descriptive statistics). The MANOVA for gender was statistically significant at the p < .01 level, F(3, 2429) = 31.42. All 3 step-down analyses for the separate sub-scores were statistically significant at the p < .01 level. The effect size was largest for Communication Skills (Cohen's d = 0.36), second largest for Assessment and Diagnosis (Cohen's d = 0.34), and smallest for Management (Cohen's d = 0.25). For all 3 sub-scores, female candidates had higher average performance than male candidates.
The MANOVA for type of IMG was statistically significant at the p < .01 level, F(3, 2429) = 118.36. All step-down analyses were statistically significant at the p < .01 level. The effect size was largest for Communication Skills (Cohen's d = 0.47), second largest for Assessment and Diagnosis (Cohen's d = 0.17), and smallest for Management (Cohen's d = 0.13). The average Communication Skills sub-score was higher for candidates who were CSAs; for Assessment and Diagnosis and Management, candidates who were not CSAs had higher average sub-scores.
The MANOVA for type of PGT was statistically significant at the p < .01 level, F(3, 2429) = 68.38. All step-down analyses were statistically significant at the p < .01 level. The effect size was largest for Communication Skills (Cohen's d = 0.32), second largest for Assessment and Diagnosis (Cohen's d = 0.23), and smallest for Management (Cohen's d = 0.13). The average Communication Skills sub-score was higher for candidates without PGT; the average Assessment and Diagnosis and Management sub-scores were higher for candidates with PGT. The MANOVA for language was not statistically significant at the p < .01 level, F(3, 2429) = 0.30.
The criterion validity evidence is presented in Table 3  as

Discussion
The use of OSCEs for the certification and licensure of physicians dates back 30 years [3]. These performance-based assessments measure what candidates can do, albeit in a simulated environment. The COVID-19 pandemic forced many testing organizations to cancel or modify their performance assessments. For those organizations that modified their assessments, both in terms of administrative protocols and content, it is important to gather additional evidence to support the validity of any inferences based on examination scores. Given that other interruptions to testing could occur in the future, it is also important to document changes in administrative protocols, including those that may have some impact, both positive and negative, on the quality of the assessment. In this study, we gathered evidence to support the psychometric adequacy of the NAC Examination scores for administrations taken under COVID-19 conditions. The results of this study indicated that the NAC Examination had reliable total scores and pass/fail decisions. Moreover, expected differences in total scores and sub-scores for defined groups were consistent with previous literature [19,35]. The internal relationships amongst NAC Examination sub-scores and their external relationships with the MCCQE Part I supported both discriminant and criterion-related validity arguments. Overall, the changes made to the NAC Examination do not represent threats to the validity of the score interpretations.
With the numerous modifications to the NAC Examination, including the touchless physical examination, other potential sources of measurement error come into play. However, the reliability estimates using Cronbach's alpha and DC and DA values for pass/fail decisions indicated that the scores are reliable and that consistent pass/fail decisions are being made. Furthermore, the reliability estimates were similar to those found for OSCEs of similar length, and comparable to values observed on NAC Examination test forms administered prior to the COVID-19 pandemic [36]. The DC and DA values were a bit higher than those found for pre-COVID-19 administrations, but they can be influenced by both fewer candidates near the cut score and the overall reliability of the test form. For a decision/interpretation validity framework, the NAC Examination used similar approaches to establish the cut score before and after COVID-19, following best practices [28,37]. Overall, the NAC Examination modifications yielded scores with acceptable levels of measurement error.
Although the modified NAC Examination scores were reliable, this does not provide concrete evidence that we are measuring the intended abilities. To investigate this, we compared the performances of defined groups. We found that, on average, females outperformed males on Communication Skills. As has been documented in other studies, females tend to outperform males on clinical skills assessments, more so for communication [19,35]. When looking at actual practice data, female physicians have been found to be, on average, better communicators and therefore more likely to obtain more relevant data from patients [38]. We also found that CSAs had better Communication Skills than non-CSAs. Since these individuals would have experience in the Canadian education system, one would expect their communication skills, which may be dependent on language proficiency, to be more advanced. It was interesting that non-CSAs, on average, had higher Assessment and Diagnosis and Management sub-scores. This may reflect the fact that non-CSAs have more clinical experience, some having completed part or all of their residency training programs. As expected, candidates with prior PGT outperformed those without PGT in both Assessment and Diagnosis and Management. Our final comparison looked at performance by language of administration. Given that the English and French NAC Examinations are constructed using the same blueprint to be of comparable difficulty, and there is no reason to believe that the English and French candidates have different abilities, our non-significant finding eliminates one potential threat to the validity of the test scores. In general, it would be expected that the scores on a written examination would not have very high correlations with a performance-based examination given the different competencies being demonstrated and evaluated.
There is one limitation to the interpretation of these correlations, in that candidates challenging the MCCQE Part I may be taking the examination at a different point in time. These correlations may be higher if the two examinations were routinely taken in a short time frame. The difference in timing may be due to the MCCQE Part I being available in up to 80 countries while the NAC Examination is offered only within Canada. Candidates who do not pass the MCCQE Part I may also forego taking the NAC Examination as their application for residency may not be competitive. The associations between scores provide some evidence to support the construct and criterion-related validity of the modified NAC Examination.
Some organizations successfully offered virtual performance assessments, generally with smaller candidate numbers for final year medical school examinations [39][40][41][42][43][44]. Often these assessments were simply an oral examination with small candidate numbers (under 50 candidates) conducted using communication applications such as Zoom or Microsoft Teams, where no physical examination, and sometimes no history taking or communication skills, were assessed. Others encountered numerous challenges, potentially compromising the validity of the scores [45]. Even for those organizations that successfully offered a "hands-off" assessment, questions concerning the nature of the constructs being measured remain; for example, it is unclear if in-person communication is the same as communication over an electronic platform. Some organizations converted to a virtual platform with larger candidate volumes, but this came with delays and postponing of candidate examination results across several years [46,47]. This study has outlined that with larger candidate numbers, modifications to existing in-person OSCEs were possible.

Future considerations
Physical examinations that were "hands-on" were reintroduced to the NAC Examination in May 2023 with masks during the encounters. During the administration of the modified NAC Examination, true assessment of physical examination skills was replaced by an assessment of clinical rationale for specific examination maneuvers. In the end, the assessment of why the candidate wanted to perform specific maneuvers was considered a net gain in that, going forward, the MCC is developing physical examination stations with a hybrid approach, involving a hands-on physical examination and augmenting it with some of the clinical reasoning facets used for the touchless version of the examination. It may be warranted to evaluate long-term outcomes on how the adjustments to the NAC Examination impacted the skills being measured on the adjusted NAC Examination from a program perspective.
Shifting all training, orientation and staff meetings to a virtual platform, with the ability to conduct site meetings more frequently, provided for a greater exchange of information and adoption of best practices. These meetings likely would not have been implemented outside of the urgency of reconfiguring the exam during the pandemic. The COVID-19 pandemic accelerated a positive shift to online educational meetings. The MCC will continue with these virtual sessions at peak preparation times.

Conclusions
While the modifications to the NAC Examination yielded reliable scores and pass/fail decisions, and some evidence to support their validity, the assessment was different, both in terms of administration and content. With respect to administration, the practice of medicine at the time was being carried out under PPE conditions, thus enhancing the fidelity of the assessment. Given that the reliability of the scores was similar to that for pre-COVID administrations, it is reasonable to presume that the online training of SPs and physician examiners was adequate. Finally, the "touchless" physical examination, a necessary modification at the time, cannot be used to measure a candidate's ability to evaluate objective anatomic findings through the use of palpation, percussion, and auscultation. It can, however, be employed to measure clinical reasoning related to physical examination skills, a necessary competency for the practice of medicine. All in all, the COVID-19 pandemic provided the impetus to make changes to OSCEs, many of which were positive. Going forward, the MCC and other organizations now have the expertise and knowledge base to appropriately modify their assessments should another health crisis or other type of examination interruption occur.
Gathering evidence to support the validity of examination scores, or any inferences we make based on the examination scores, is never complete [48]. While we do provide some evidence to support the reliability and validity of the NAC Examination as administered under COVID-19 conditions, additional investigations are warranted. In moving to a "touchless" physical examination, a different construct is being measured. However, one would still expect that knowledge of which physical examination maneuvers were appropriate should be related to the ability to perform other physical examination skills. This could be studied in the future. Given the artificial nature of the simulation environment, it is important to know whether NAC Examination performance relates to performance with 'real' patients. While the relationships between OSCE performance and patient care have been studied elsewhere [19], this has not been done specifically for the NAC Examination. The MCC, like most testing organizations, is dedicated to providing data to support the valid use of its examinations.
the correlation coefficients for the total scores and sub-scores for both the NAC Examination and MCCQE Part I. All the correlations were statistically significant. The NAC Examination total score and sub-score correlations were large, and the individual sub-score correlations were small to moderate. The correlation between the NAC Examination total score and MCCQE Part I total score was moderate at r = .52; the NAC Examination total score and MCCQE Part I sub-score correlations were moderate for Health Promotion and Illness Prevention, Acute Care, Chronic Care, Assessment and Diagnosis, and Management. The correlations were lower for Psychosocial Aspects, Communication and Professional Behaviours. In general, the NAC Examination sub-scores were less highly associated with the MCCQE Part I Psychosocial Aspects, Communication, and Professional Behaviours sub-scores. The highest correlation (r = .49) was between the NAC Examination Assessment and Diagnosis sub-score and the MCCQE Part I Assessment and Diagnosis sub-score.

We also quantified the internal associations between the NAC score and sub-scores and their relationships with the scores for an assessment measuring different constructs (MCCQE Part I). The highest criterion-related validity coefficient was between the NAC Examination total score and the Assessment and Diagnosis sub-score for the NAC Examination. From a blueprint perspective, the Assessment and Diagnosis category is the most heavily weighted category, where approximately 70% of the exam content is allocated. The correlations between the NAC Examination sub-scores were moderately high, showing that there was some overlap in the constructs being measured. The highest correlations for the MCCQE Part I total score were with Assessment and Diagnosis and Acute Care and Management sub-scores of the NAC Examination. The range of correlations between NAC Examination scores and MCCQE Part I scores was expected given that the blueprints for both exams share a fair amount of overlap [26, 29]. There are, however, several unique sub-scores on the MCCQE Part I, such as Health Promotion and Illness Prevention, Psychosocial Aspects, and Professional Behaviours. The lower correlations of NAC Examination scores with these dimensions indicate that the two assessments measure different constructs, which is not unexpected given that these constructs are not represented on the NAC Examination blueprint. While we found that the NAC Examination Communication score was only moderately associated with the MCCQE Part I Communication sub-score, candidates taking the NAC Examination must demonstrate communication skills as opposed to knowing communication principles (MCCQE Part I).

Table 1
Descriptive statistics of the NAC Examination, total scores and sub-scores

Table 2
Descriptive statistics for gender, type of IMG, PGT training, and language

Table 3
Correlations for NAC Examination and MCCQE Part I total scores and sub-scores